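In this work, we address the problem of 4D facial expression generation. This is usually tackled by animating a neutral 3D face to reach an expression peak and then return to the neutral state. In the real world, however, people show more complex expressions and switch from one expression to another. We therefore propose a new model that generates transitions between different expressions and synthesizes long, composed 4D expressions. This involves three sub-problems: (i) modeling the temporal dynamics of expressions, (ii) learning transitions between them, and (iii) deforming a generic mesh. We propose to encode the temporal evolution of expressions using the motion of a set of 3D landmarks, which we learn to generate by training a manifold-valued GAN (Motion3DGAN). To allow the generation of composed expressions, the model accepts two labels encoding the starting and ending expressions. The final sequence of meshes is generated by a Sparse2Dense mesh decoder (S2D-Dec) that maps the landmark displacements to dense, per-vertex displacements of a known mesh topology. By working explicitly with motion trajectories, the model is fully independent of identity. Extensive experiments on five public datasets show that the proposed approach brings significant improvements over previous solutions while retaining good generalization to unseen data.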
Translated by Google Translate
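Deep learning has advanced face recognition to unprecedented accuracy. However, how local parts of the face affect overall recognition performance is still largely unclear. Among other approaches, face swapping has been tried for this purpose, but only for the entire face. In this paper, we propose swapping facial parts as a way to disentangle the recognition relevance of different face parts, such as the eyes, nose, and mouth. In our method, parts are swapped from a source face to a target face by fitting a 3D prior, which establishes dense pixel correspondences between parts while also handling pose differences. Seamless cloning is then used to obtain smooth transitions between the mapped source regions and the shape and skin tone of the target face. We designed an experimental protocol that allows us to draw some preliminary conclusions when the swapped images are classified by deep networks, indicating the prominence of the eye and eyebrow region. Code is available at https://github.com/clferrari/facepartsswap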
There is a dramatic shortage of skilled labor for modern vineyards. The Vinum project is developing a mobile robotic solution that autonomously navigates through vineyards for winter grapevine pruning, which requires an autonomous navigation stack for the pruning robot. The project uses the quadruped robot HyQReal. This paper introduces an architecture that lets a quadruped robot move autonomously through a vineyard by identifying and approaching grapevines for pruning. The higher-level control is a state machine that switches between searching for destination positions, autonomously navigating towards those locations, and stopping so that the robot can complete a task. The destination points are determined by identifying grapevine trunks using instance segmentation from a Mask Region-Based Convolutional Neural Network (Mask-RCNN). These detections are passed through a filter to avoid redundancy and remove noisy detections. The combination of these features is the basis for the proposed architecture.
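The search, navigate, stop cycle and the detection filter described in this abstract can be sketched as follows. This is only an illustration: the state names, the three boolean events, the `filter_detections` helper, and the 0.5 m separation threshold are all assumptions, not details taken from the paper.

```python
from enum import Enum, auto


class State(Enum):
    SEARCHING = auto()   # looking for a destination (a grapevine trunk)
    NAVIGATING = auto()  # walking towards the chosen destination
    STOPPED = auto()     # standing still while the task is carried out


def next_state(state, target_found, target_reached, task_done):
    """Return the next high-level state given three boolean events (hypothetical)."""
    if state is State.SEARCHING and target_found:
        return State.NAVIGATING
    if state is State.NAVIGATING and target_reached:
        return State.STOPPED
    if state is State.STOPPED and task_done:
        return State.SEARCHING
    return state  # no relevant event: stay in the current state


def filter_detections(points, min_separation=0.5):
    """Drop detections closer than min_separation (assumed metres) to one already kept."""
    kept = []
    for x, y in points:
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_separation ** 2
               for kx, ky in kept):
            kept.append((x, y))
    return kept
```

A greedy distance filter like this only removes duplicates; rejecting noisy detections would likely also need a confidence or persistence check, which the abstract does not specify.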
Explainability is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the topic, yet explainability still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. We recognize that explanations are not atomic but the product of evidence stemming from the model and its input-output and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's decision-making) and plausibility (i.e., how convincing the explanation looks to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and provides new insight into common explanation methods that we analyze as case studies.
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single-object setting. Multi-object GOT has wider applicability, making it more attractive for real-world applications. We attribute the lack of research interest in this problem to the absence of suitable benchmarks. In this work, we introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence. Our benchmark allows researchers to tackle key remaining challenges in GOT, aiming to increase robustness and reduce computation through joint tracking of multiple objects. Furthermore, we propose a Transformer-based GOT tracker, TaMOs, capable of jointly processing multiple objects through shared computation. TaMOs achieves a 4x faster run-time for 10 concurrent objects compared to tracking each object independently and outperforms existing single-object trackers on our new benchmark. Finally, TaMOs achieves highly competitive results on single-object GOT datasets, setting a new state of the art on TrackingNet with a success rate AUC of 84.4%. Our benchmark, code, and trained models will be made publicly available.
In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can largely benefit from additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. To reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the lessons learned from the study offer valuable insight into building biomedical language models that generalize to other less-resourced languages and different domain settings.
Machine Learning algorithms have been extensively researched throughout the last decade, leading to unprecedented advances in a broad range of applications, such as image classification and reconstruction, object recognition, and text categorization. Nonetheless, most Machine Learning algorithms are trained via derivative-based optimizers, such as Stochastic Gradient Descent, which can become trapped in local optima and thus fail to reach their best performance. A bio-inspired alternative to traditional optimization techniques, known as meta-heuristics, has received significant attention due to its simplicity and ability to avoid entrapment in local optima. In this work, we propose to use meta-heuristic techniques to fine-tune pre-trained weights, exploring additional regions of the search space and improving their effectiveness. The experimental evaluation comprises two classification tasks (image and text) and is assessed on four datasets from the literature. Experimental results show the capacity of nature-inspired algorithms to explore the neighborhood of pre-trained weights, achieving results superior to those of their pre-trained counterparts. Additionally, a thorough analysis of distinct architectures, such as Multi-Layer Perceptrons and Recurrent Neural Networks, attempts to visualize and provide more precise insights into the most critical weights to be fine-tuned in the learning process.
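The idea of exploring the neighborhood of pre-trained weights without gradients can be illustrated with a minimal sketch. It uses a plain stochastic local search as a stand-in for the nature-inspired optimizers the abstract refers to; the function `fine_tune`, the quadratic `toy_loss`, the starting weights, and all hyperparameters are hypothetical and not taken from the paper.

```python
import random


def fine_tune(weights, loss, iterations=200, population=10, step=0.1, seed=0):
    """Gradient-free search: sample candidates around the best weights so far
    and keep a candidate only when it lowers the loss."""
    rng = random.Random(seed)
    best = list(weights)
    best_loss = loss(best)
    for _ in range(iterations):
        for _ in range(population):
            # Perturb every weight with Gaussian noise of standard deviation `step`.
            candidate = [w + rng.gauss(0.0, step) for w in best]
            candidate_loss = loss(candidate)
            if candidate_loss < best_loss:
                best, best_loss = candidate, candidate_loss
    return best, best_loss


def toy_loss(w):
    """Hypothetical stand-in for a validation loss, minimised at (1, -2)."""
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2


pretrained = [0.8, -1.7]  # pretend these came from gradient-based training
tuned, tuned_loss = fine_tune(pretrained, toy_loss)
```

Because a candidate is accepted only when it improves the loss, the returned weights are never worse than the starting point on the chosen objective. In practice the loss would be evaluated on held-out data for a real network.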
Efficient localization plays a vital role in many modern applications of Unmanned Ground Vehicles (UGVs) and Unmanned Aerial Vehicles (UAVs), contributing to improved control, safety, power economy, etc. The ubiquitous 5G NR (New Radio) cellular network will provide new opportunities for enhancing the localization of UAVs and UGVs. In this paper, we review radio frequency (RF) based approaches to localization. We review the RF features that can be utilized for localization and investigate the current methods suitable for unmanned vehicles under two general categories: range-based and fingerprinting. The existing state-of-the-art literature on RF-based localization for both UAVs and UGVs is examined, and the envisioned use of 5G NR for localization enhancement and future research directions are explored.
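As a concrete instance of the range-based category, the sketch below recovers a 2D position from distances to three known anchors by linearizing the circle equations. It is a textbook trilateration example under idealized, noise-free assumptions, not a method taken from the review; the function name and the anchor layout are hypothetical.

```python
import math


def trilaterate(anchors, ranges):
    """Estimate (x, y) from three anchor positions and the measured distances to them."""
    (x0, y0), (x1, y1), (x2, y2) = anchors
    r0, r1, r2 = ranges
    # Subtracting the first circle equation from the other two gives a 2x2 linear system.
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = r0 ** 2 - r1 ** 2 + x1 ** 2 - x0 ** 2 + y1 ** 2 - y0 ** 2
    b2 = r0 ** 2 - r2 ** 2 + x2 ** 2 - x0 ** 2 + y2 ** 2 - y0 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is not unique")
    # Solve the system with Cramer's rule.
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Hypothetical setup: three base stations and a vehicle located at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
ranges = [math.hypot(true_position[0] - ax, true_position[1] - ay)
          for ax, ay in anchors]
estimate = trilaterate(anchors, ranges)
```

With noisy range measurements and more than three anchors, the same linearization is usually solved in a least-squares sense instead.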
This work addresses vision-based planning strategies for legged robots that separate locomotion planning into foothold selection and pose adaptation. Current pose adaptation strategies optimize the robot's body pose relative to given footholds. If these footholds are not reached, the robot may end up in a state with no reachable safe footholds. Therefore, we present a Vision-Based Terrain-Aware Locomotion (ViTAL) strategy that consists of novel pose adaptation and foothold selection algorithms. ViTAL introduces a different paradigm in pose adaptation: rather than optimizing the body pose relative to given footholds, it finds the body pose that maximizes the chances of the legs reaching safe footholds. ViTAL plans footholds and poses based on skills that characterize the robot's capabilities and its terrain-awareness. We use the 90 kg HyQ and 140 kg HyQReal quadruped robots to validate ViTAL, and show that they are able to climb various obstacles, including stairs, gaps, and rough terrain, at different speeds and gaits. We compare ViTAL with a baseline strategy that selects the robot pose based on given selected footholds, and show that ViTAL outperforms the baseline.
Mitotic activity is a crucial proliferation biomarker for the diagnosis and prognosis of different types of cancers. Nevertheless, mitosis counting is a cumbersome process for pathologists, prone to low reproducibility, due to the large size of augmented biopsy slides, the low density of mitotic cells, and pattern heterogeneity. To improve reproducibility, deep learning methods using convolutional neural networks have been proposed in recent years. However, these methods have been hindered by the data labelling process, since the labels usually consist solely of the mitosis centroids. Therefore, current literature proposes complex algorithms with multiple stages to refine the labels at pixel level and to reduce the number of false positives. In this work, we propose to avoid complex scenarios, and we perform the localization task in a weakly supervised manner, using only image-level labels on patches. The results obtained on the publicly available TUPAC16 dataset are competitive with state-of-the-art methods, using only one training phase. Our method achieves an F1-score of 0.729 and challenges the efficiency of previous methods, which required multiple stages and strong mitosis location information.
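For readers unfamiliar with the reported metric, the F1-score is the harmonic mean of precision and recall over detected mitoses. The short sketch below computes it from detection counts; the counts in the example are made up for illustration and are unrelated to the TUPAC16 results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from detection counts."""
    if true_positives == 0:
        return 0.0  # no correct detections: define F1 as zero and avoid dividing by zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed mitoses.
example = f1_score(8, 2, 2)
```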